NLP Applications of Sinhala: TTS & OCR
Abstract
This paper presents the practical applications and evaluation of two systems for Sinhala: the first Text-to-Speech (TTS) system for Sinhala, built on the Festival framework, and an Optical Character Recognition (OCR) system for Sinhala.
Similar Resources
Festival-si: A Sinhala Text-to-Speech System
This paper brings together the development of the first Text-to-Speech (TTS) system for Sinhala using the Festival framework and its practical applications. It describes the construction of a diphone database and the implementation of the natural language processing modules. The paper also presents the methodology developed to support direct Sinhala Unicode text input by rewriting the Letter-to-Sound rules in...
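The Letter-to-Sound step mentioned above can be illustrated with a minimal rule-based sketch over Sinhala Unicode input. It is not the Festival-si rule set: the phone labels are hypothetical, only a small subset of consonants and vowel signs is covered, and the dictionary and function names are invented for illustration. It applies only the basic script rule that a consonant letter carries an inherent vowel unless it is followed by a vowel sign or by the al-lakuna (virama, U+0DCA).

```python
# Minimal rule-based letter-to-sound sketch for Sinhala Unicode text.
# Phone labels are hypothetical, not Festival-si's phone set.

CONSONANTS = {  # small subset of consonant letters
    "\u0D9A": "k",  # ක
    "\u0D9C": "g",  # ග
    "\u0DAD": "t",  # ත
    "\u0DAF": "d",  # ද
    "\u0DB1": "n",  # න
    "\u0DB4": "p",  # ප
    "\u0DB8": "m",  # ම
    "\u0DBA": "y",  # ය
    "\u0DBB": "r",  # ර
    "\u0DBD": "l",  # ල
    "\u0DC3": "s",  # ස
}
VOWEL_SIGNS = {  # dependent vowel signs (precomposed code points)
    "\u0DCF": "aa", "\u0DD0": "ae", "\u0DD2": "i", "\u0DD3": "ii",
    "\u0DD4": "u", "\u0DD6": "uu", "\u0DD9": "e", "\u0DDA": "ee",
    "\u0DDC": "o", "\u0DDD": "oo",
}
INDEPENDENT_VOWELS = {
    "\u0D85": "a", "\u0D86": "aa", "\u0D89": "i", "\u0D8A": "ii",
    "\u0D8B": "u", "\u0D8C": "uu", "\u0D91": "e", "\u0D92": "ee",
    "\u0D94": "o", "\u0D95": "oo",
}
VIRAMA = "\u0DCA"      # al-lakuna: suppresses the inherent vowel
INHERENT_VOWEL = "a"   # simplification: always /a/, no schwa rules


def letter_to_sound(word: str) -> list[str]:
    """Convert a Sinhala word to a list of (hypothetical) phone labels."""
    phones = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            if nxt == VIRAMA:            # bare consonant, no vowel
                i += 2
                continue
            if nxt in VOWEL_SIGNS:       # consonant + explicit vowel
                phones.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            phones.append(INHERENT_VOWEL)  # consonant + inherent vowel
        elif ch in INDEPENDENT_VOWELS:
            phones.append(INDEPENDENT_VOWELS[ch])
        else:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
        i += 1
    return phones


print(letter_to_sound("\u0DB8\u0DBD"))  # මල -> ['m', 'a', 'l', 'a']
```

The sketch always emits /a/ for the inherent vowel, whereas in real Sinhala pronunciation the inherent vowel can be realised as /a/ or as a schwa depending on context; rules of that kind are what a full Letter-to-Sound module has to add on top of this character-level mapping.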
Recognition of Printed Sinhala Characters Using Linear Symmetry
Sinhala characters, used in the Sinhala script by over 70% of Sri Lanka's 18 million people, are descended from the ancient Brahmi script. The Sinhala alphabet consists of vowels and consonants, and the consonants are modified with modifier symbols to produce the required vowel sounds. In the process of developing an OCR for the Sinhala script, characters are initially recognised thr...
Text-to-Speech Synthesis for Mandarin Chinese
A Text-to-Speech (TTS) synthesizer is a computer-based system that can automatically read text aloud, regardless of whether the text is supplied through a computer input stream or as scanned input submitted to an optical character recognition (OCR) engine. TTS synthesis can be used in many areas, such as telecommunication services, language education, vocal monitoring, multimedia, and as ...
Lexicon and hidden Markov model-based optimisation of the recognised Sinhala script
The Brahmi-descended Sinhala script is used by 75% of Sri Lanka's 18 million people. To the best of our knowledge, none of the Brahmi-descended scripts used by hundreds of millions of people in South Asia has a commercial OCR product. In the course of implementing an OCR system for the printed Sinhala script that is easily adaptable to similar scripts [Premaratne, L., Assabi...
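The abstract is cut off before it describes the method, so the following is only a generic illustration of HMM-based correction of recognised text, not the authors' model. Under that assumption, the hidden states are the true characters, the observations are the recogniser's output labels, the emission probabilities model recognition confusions, and the transition probabilities form a character bigram model (which could, for instance, be estimated from a lexicon). Viterbi decoding then selects the most probable character sequence. The labels `"a"`/`"b"` and all probabilities are hypothetical toy values.

```python
import math


def _log(p: float) -> float:
    # log(0) is undefined; treat impossible events as -infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden character sequence for a sequence of OCR outputs."""
    # Each column maps state -> (best log-probability so far, previous state).
    columns = [{s: (_log(start_p[s]) + _log(emit_p[s][observations[0]]), None)
                for s in states}]
    for obs in observations[1:]:
        prev_col = columns[-1]
        col = {}
        for s in states:
            # Best predecessor under the character bigram (transition) model.
            prev, score = max(((p, prev_col[p][0] + _log(trans_p[p][s]))
                               for p in states), key=lambda item: item[1])
            col[s] = (score + _log(emit_p[s][obs]), prev)
        columns.append(col)
    # Trace back from the best final state.
    state = max(columns[-1], key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical toy model: two character classes that the recogniser confuses.
STATES = ["a", "b"]
START = {"a": 0.6, "b": 0.4}
TRANS = {"a": {"a": 0.1, "b": 0.9}, "b": {"a": 0.9, "b": 0.1}}
EMIT = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.3, "b": 0.7}}

print(viterbi(["a", "a"], STATES, START, TRANS, EMIT))  # ['a', 'b']
```

With these toy values the bigram model overrides the recogniser's second reading: the path `a b` has probability 0.6 × 0.7 × 0.9 × 0.3 = 0.1134, higher than the literal reading `a a` at 0.6 × 0.7 × 0.1 × 0.7 = 0.0294.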
A Generative Probabilistic OCR Model for NLP Applications
In this paper, we introduce a generative probabilistic optical character recognition (OCR) model that describes an end-to-end process in the noisy channel framework, progressing from generation of true text through its transformation into the noisy output of an OCR system. The model is designed for use in error correction, with a focus on post-processing the output of black-box OCR systems in o...
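The noisy channel idea behind this model can be shown with a much simpler word-level corrector: choose the lexicon word $w$ that maximises $P(w)\,P(o \mid w)$ for an observed OCR token $o$. This is a deliberately simplified sketch, not the paper's generative model. The channel model here is an assumption (each Levenshtein edit multiplies the probability by a hypothetical `error_rate`), and the function names and toy lexicon are invented for illustration.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[-1]


def correct(observed: str, lexicon: dict[str, float], error_rate: float = 0.1) -> str:
    """Return argmax_w P(w) * P(observed | w) over the lexicon."""
    def log_score(word: str) -> float:
        # Source model: unigram prior P(w) (must be > 0).
        # Channel model (hypothetical): error_rate ** edit_distance.
        return math.log(lexicon[word]) + edit_distance(observed, word) * math.log(error_rate)
    return max(lexicon, key=log_score)


# Hypothetical toy lexicon with prior probabilities.
LEXICON = {"the": 0.6, "then": 0.39, "tho": 0.01}

print(correct("tbe", LEXICON))  # the
print(correct("tho", LEXICON))  # the
```

The second call shows the role of the source model: even though `tho` is in the lexicon and matches exactly, the prior for `the` (0.6 × 0.1 = 0.06) outweighs it (0.01 × 1 = 0.01).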